FIGURE 1.7 The generation of CiF.
Martinez et al. [168] attempt to minimize the discrepancy between the output of a binary convolution and that of the corresponding real-valued convolution. They propose real-to-binary attention matching suited to training 1-bit CNNs, and they also devise an approach in which the architectural gap between real and binary networks is progressively bridged through a sequence of teacher-student pairs.
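As an illustration of the attention-matching idea, the following sketch (not the authors' implementation) computes normalized spatial attention maps from intermediate activations and penalizes their mismatch between the binary and real-valued networks; the function names and the PyTorch formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    # feat: (N, C, H, W) activations; collapse channels into a spatial attention map
    att = feat.pow(2).mean(dim=1).flatten(1)   # (N, H*W)
    return F.normalize(att, dim=1)             # L2-normalize per sample

def attention_matching_loss(binary_feats, real_feats):
    # Sum of squared distances between normalized attention maps at matched layers
    return sum(
        (attention_map(b) - attention_map(r)).pow(2).sum(dim=1).mean()
        for b, r in zip(binary_feats, real_feats)
    )
```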
Instead of using a pre-trained full-precision model, Bethge et al. [11] train a binary network directly from scratch, forgoing the benefits of other standard methods. Their implementation is based on the BMXNet framework [268].
Helwegen et al. [85] argue that real-valued latent weights cannot be treated analogously to the weights of real-valued networks; rather, their primary role is to provide inertia during training. They introduced the Binary Optimizer (Bop), the first optimizer designed specifically for BNNs.
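The flip-based update at the heart of Bop can be sketched as follows; the hyperparameter values are placeholders and the NumPy formulation is only illustrative of the rule described in [85].

```python
import numpy as np

def bop_step(w, grad, m, gamma=1e-4, tau=1e-8):
    # w: binary weights in {-1, +1}; m: exponential moving average of the gradient.
    m = (1.0 - gamma) * m + gamma * grad
    # Flip a weight only when the averaged gradient is strong enough (|m| > tau)
    # and agrees in sign with the weight, i.e., the loss favors the opposite sign.
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))
    w = np.where(flip, -w, w)
    return w, m
```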
BinaryDuo [115] proposes a new training scheme for binary-activation networks in which two binary activations are coupled into a ternary activation during training. The ternary activation is later decoupled into two binary activations, which doubles the number of weights; the coupled ternary model is therefore shrunk so that, after decoupling, its parameter size matches that of the baseline model. Because the two decoupled weights no longer need to share the same value, each weight is updated independently to find a better value, as sketched below.
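The coupling/decoupling step can be illustrated with a small sketch; the value sets ({-1, +1} for binary and {-1, 0, +1} for ternary activations) are assumptions made for illustration rather than the exact choices in [115].

```python
import numpy as np

def couple(b1, b2):
    # Couple two binary activations in {-1, +1} into one ternary activation in {-1, 0, +1}.
    return (b1 + b2) / 2.0

def decouple(t):
    # Decouple a ternary activation into two binary activations whose mean
    # reproduces it; afterwards each branch's weights can be updated independently.
    b1 = np.where(t > -1.0, 1.0, -1.0)   # +1 unless t == -1
    b2 = np.where(t >= 1.0, 1.0, -1.0)   # +1 only when t == +1
    return b1, b2
```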
BENN [301] uses classical ensemble methods to improve the performance of 1-bit CNNs. While ensemble techniques are broadly believed to be only marginally helpful for strong classifiers such as deep neural networks, their analysis and experiments show that ensembles are a natural fit for boosting BNNs. The main uses of ensemble strategies are shown in [19, 32, 184].
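A minimal sketch of ensembling 1-bit CNNs by averaging their predictions is shown below; BENN additionally studies bagging- and boosting-style ways of building the ensemble, which this sketch omits, and the PyTorch interface is an assumption.

```python
import torch

def ensemble_predict(binary_models, x):
    # Average the softmax outputs of several independently trained 1-bit CNNs
    # and take the argmax as the ensemble prediction.
    with torch.no_grad():
        probs = [torch.softmax(model(x), dim=1) for model in binary_models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```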
TentacleNet [173] is also inspired by the theory of ensemble learning. Compared to
BENN [301], TentacleNet takes a step forward, showing that binary ensembles can reach
high accuracy with fewer resources.
BayesBiNN [170] places a distribution over the binary weights, resulting in a principled approach to discrete optimization. It uses a Bernoulli approximation to the posterior and estimates it with the Bayesian learning rule proposed in [112].
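A sketch of drawing a relaxed binary weight from such a Bernoulli-type posterior is given below; the natural-parameter form (mean = tanh(lam)) and the logistic-noise relaxation follow the general recipe in [170], but the exact parameterization here is an assumption.

```python
import numpy as np

def sample_relaxed_binary(lam, temperature=1.0, rng=None):
    # lam: natural parameter of the posterior over a {-1, +1} weight (mean = tanh(lam)).
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=np.shape(lam))
    delta = 0.5 * np.log(u / (1.0 - u))          # logistic noise for the relaxation
    return np.tanh((lam + delta) / temperature)  # differentiable sample in (-1, 1)
```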
1.2 Applications
The success of BNNs makes it possible to apply deep learning models to edge computing. With the help of these binary methods, neural network models have been used in various real-world tasks, including image classification, speech recognition, and object detection and tracking.